Abstract

Given a density $0 < d < 1$, we show for all sufficiently large primes
$p$ that if $S \subseteq \mathbb{Z}/p\mathbb{Z}$ has the least number of three-term arithmetic
progressions among all such sets having at least $dp$ elements, then $S$ must
contain an arithmetic progression of length at least $\log^{1/4+o(1)} p$.

1 Introduction 

Given a prime $p$, we say that $S \subseteq \mathbb{Z}/p\mathbb{Z}$ is a critical set for the density $d$ if
and only if $|S| \ge dp$ and $S$ has the least number of three-term arithmetic
progressions among all the subsets of $\mathbb{Z}/p\mathbb{Z}$ having at least $dp$ elements. In
this context, a three-term arithmetic progression is a triple of residue classes
$n, n+m, n+2m$ modulo $p$. Note that this includes "trivial" progressions,
which are ones where $m \equiv 0 \pmod p$, as well as "non-trivial" progressions,
which are ones where $m \not\equiv 0 \pmod p$. We also distinguish two different
progressions according to how they are ordered: the progression $n, n+m, n+2m$
is considered different from $n+2m, n+m, n$.
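The counting convention just described can be made concrete with a small brute-force sketch. The function name `count_3aps` and the example set are our own illustrative choices, not notation from the paper; the code simply enumerates all pairs $(n, m)$ and is only practical for small $p$.

```python
def count_3aps(S, p):
    """Count ordered triples (n, n+m, n+2m) mod p that lie entirely in S.

    Trivial progressions (m = 0) are included, and the progression
    n, n+m, n+2m is counted separately from n+2m, n+m, n.
    """
    members = {x % p for x in S}
    total = 0
    for n in members:
        for m in range(p):
            if (n + m) % p in members and (n + 2 * m) % p in members:
                total += 1
    return total


# Hypothetical example: S = {0, 1, 2} inside Z/7Z.
# The progressions are the 3 trivial ones together with 0,1,2 and 2,1,0.
example_count = count_3aps({0, 1, 2}, 7)
print(example_count)
```

For the full group $\mathbb{Z}/p\mathbb{Z}$ every pair $(n, m)$ qualifies, so the count is $p^2$.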

The main result of the paper is the following theorem, which basically
says that critical sets of positive density must have long arithmetic progressions.

Theorem 1 Given $0 < d < 1$, the following holds for all
sufficiently large prime numbers $p$: if $S \subseteq \mathbb{Z}/p\mathbb{Z}$ is a critical set for the
density $d$, then $S$ must contain an arithmetic progression modulo $p$ of length
at least $\log^{1/4+o(1)} p$.
Moreover, we show that for every $L > 1$, $0 < d < 1$, and $p$ sufficiently
large, there exists an arithmetic progression $N \subseteq \mathbb{Z}/p\mathbb{Z}$ having length at least



We now compare this theorem with the state of the art on long progressions
in arbitrary sets of integers. As a consequence of W. T. Gowers's deep
and beautiful proof of Szemerédi's Theorem [Theorem 18.6], one can show
that for $0 < \delta < 1$, and all $x$ sufficiently large, any set $S \subseteq \{1, 2, \ldots, x\}$
having at least $\delta x$ elements contains an arithmetic progression of length at least
$\log\log\log\log\log(x) + c(\delta)$, for some constant $c(\delta)$. This is a considerably
shorter AP than the one given for critical sets in our theorem above.

There are also some results for sumsets, which give much longer AP's.
For example, J. Bourgain proved the interesting result that if $A, B \subseteq \{1, 2, \ldots, x\}$,
where $|A| \ge \delta x$, $|B| \ge \gamma x$, then $A + B$ has an arithmetic progression
of length at least $\exp(c(\delta\gamma \log x)^{1/3} - \log\log x)$ (for some $c > 0$).
I. Ruzsa gave an ingenious construction, which shows that for every
$0 < \epsilon < 1/3$, and all $x$ sufficiently large, there exists a set $A$ having at least
$b(\epsilon)x$ elements (for some function $b(\epsilon) > 0$ that depends only on $\epsilon$), such that
$A + A$ has no arithmetic progressions longer than $\exp(\log^{2/3-\epsilon} x)$. Then, B.
Green [3] improved Bourgain's result, and showed that a sumset $A + B$ has
an arithmetic progression of length at least $\exp(c'(\delta\gamma \log x)^{1/2} - \log\log x)$.
We note that the progressions in these sumsets are much longer
than the ones we give for critical sets; and so, if we could somehow prove
that critical sets are sumsets of two large sets $A$ and $B$, then our result could
possibly be improved.

There are also some impressive results on long arithmetic progressions in
repeated sumsets $A + A + \cdots + A$ and subset sums, notably those of Freiman;
Sárközy [10]; Lev; Vu and Szemerédi [15]; and J. Solymosi [13].

Comments: The method of proof of our theorem has many common 
features with the result of B. Green [3]. In particular, we both make use 
of large deviation (or concentration of measure) results from probability 
theory; and we both use techniques involving Bohr neighborhoods. However, 
the combinatorial aspects of our two theorems are different, which reflects 
the fact that sumsets and critical sets have different properties that must 
be exploited in different ways. 



$\log^L p$, such that
It is possible to refine the proof of our theorem to show that critical sets
have AP's whose length depends on the density $d$; so, for example, it might
be possible to prove that critical sets $S \subseteq \mathbb{Z}/p\mathbb{Z}$ of density $d$ have a long AP
for any $d > (\log\log p)^{-1}$; and, if one applies Chang's Structure Theorem, as
Green does, one can maybe get an even better result (longer AP's holding
for lower densities $d$).

2 Proof of Theorem 1



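We identify $S$ with its indicator function $S(n)$, which is defined as follows:

\[ S(n) = \begin{cases} 1, & \text{if } n \in S; \\ 0, & \text{otherwise.} \end{cases} \]

Next, we define the discrete Fourier transform of $S(n)$ to be

\[ \hat S(a) = \sum_{0 \le n \le p-1} S(n)\, e^{2\pi i a n/p}. \]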

Then, the number of 3-term arithmetic progressions in the set
$S$ is given by

\[ \sum_{r+s \equiv 2t \pmod p} S(r)S(s)S(t) = \frac{1}{p} \sum_{0 \le a \le p-1} \hat S(a)^2\, \hat S(-2a). \]
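As a sanity check on this identity, the following sketch compares the direct count of solutions to $r + s \equiv 2t \pmod p$ with the Fourier-side expression. The helper names and the sample set are hypothetical; the transform is computed naively in $O(p^2)$ time, which is adequate for small primes.

```python
import cmath


def fourier(S, p):
    """Return [S^(a) for a = 0..p-1], where S^(a) = sum over n in S of e^{2 pi i a n / p}."""
    return [sum(cmath.exp(2j * cmath.pi * a * n / p) for n in S) for a in range(p)]


def count_3aps_direct(S, p):
    """Count ordered triples (r, s, t) in S^3 with r + s = 2t (mod p)."""
    members = set(S)
    return sum(1 for r in members for s in members for t in members
               if (r + s - 2 * t) % p == 0)


def count_3aps_fourier(S, p):
    """Evaluate (1/p) * sum_a S^(a)^2 * S^(-2a); the result is real up to rounding."""
    hat = fourier(S, p)
    total = sum(hat[a] ** 2 * hat[(-2 * a) % p] for a in range(p))
    return (total / p).real


# Hypothetical sample set in Z/11Z.
sample = {0, 1, 3, 4, 9}
print(count_3aps_direct(sample, 11), round(count_3aps_fourier(sample, 11)))
```

The two printed values agree because summing $e^{2\pi i a(r+s-2t)/p}$ over $a$ gives $p$ when $r + s \equiv 2t$ and $0$ otherwise.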

We now write this last sum as $\Sigma_1 + \Sigma_2$, where $\Sigma_1$ is the sum over all
those $a$ where

\[ |\hat S(-2a)| > \frac{p \log\log p}{\sqrt{\log p}}, \tag{1} \]

and where $\Sigma_2$ is the sum over the remaining values of $a$. From Parseval's
identity we deduce the estimate

\[ |\Sigma_2| \le \frac{p \log\log p}{\sqrt{\log p}} \sum_{(*)} |\hat S(a)|^2 \le \frac{d p^3 \log\log p}{\sqrt{\log p}}, \tag{2} \]

where the condition $(*)$ is that we sum over all $0 \le a \le p-1$ that do not
satisfy (1).

We now bound the number of terms in $\Sigma_1$ from above. Denote this
number of terms by $M$. Then, by Parseval's identity we get that

\[ M\, \frac{p^2 (\log\log p)^2}{\log p} \le \sum_{0 \le a \le p-1} |\hat S(a)|^2 \le d p^2, \]

which implies

\[ M \le \frac{d \log p}{(\log\log p)^2}. \]
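The counting argument above is a Markov-type bound: if $M$ frequencies satisfy $|\hat S(-2a)| > T$, then $M T^2 \le \sum_a |\hat S(a)|^2 = p|S|$. The sketch below illustrates this with a hypothetical threshold `T = 10`, chosen because for small primes the threshold in (1) exceeds $|S|$ and the large spectrum would be empty; the function name is our own.

```python
import cmath


def large_spectrum(S, p, threshold):
    """Return the frequencies a in 0..p-1 with |S^(-2a)| > threshold."""
    magnitude = [abs(sum(cmath.exp(2j * cmath.pi * a * n / p) for n in S))
                 for a in range(p)]
    return [a for a in range(p) if magnitude[(-2 * a) % p] > threshold]


# Hypothetical example: an interval of 40 residues modulo 101.
p = 101
S = list(range(40))
T = 10.0  # illustrative threshold, not the one used in (1)
big = large_spectrum(S, p, T)
# Parseval gives sum_a |S^(a)|^2 = p * |S|, hence len(big) * T^2 <= p * |S|.
print(len(big), p * len(S) / T ** 2)
```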

We next require the following basic lemma:

Lemma 1 Suppose that $K > 1$, $0 \le a_1, \ldots, a_k \le p-1$ and

\[ k < \frac{\log p}{2K \log\log p}. \]

Then, for $p$ sufficiently large there is at least one integer $1 \le n \le p-1$ lying
in the Bohr neighborhood defined by

\[ \text{For all } i = 1, 2, \ldots, k, \quad \left\| \frac{a_i n}{p} \right\| < \frac{1}{\log^K p}, \tag{4} \]

where $\|x\|$ denotes the distance from $x$ to the nearest integer.

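Proof of the Lemma. This is nothing more than Dirichlet's pigeonhole
argument: we consider the $p$ vectors lying in the unit $k$-cube

\[ (a_1 y/p \bmod 1, \ldots, a_k y/p \bmod 1), \]

where $y$ runs through the integers $0, 1, \ldots, p-1$. The bound on $k$ means that
the cube can be cut into at most $p^{1/2+o(1)} < p$ boxes of side at most $1/\log^K p$.
So, by the pigeonhole principle, there must exist two values of $y$, say $y_1$ and $y_2$,
such that $\|a_i(y_1 - y_2)/p\| < 1/\log^K p$ for all $i$, and we may take $n = |y_1 - y_2|$. ■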
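For small parameters the element promised by Lemma 1 can simply be searched for. The sketch below is a brute-force search rather than the pigeonhole argument itself; the names `circle_norm_numerator` and `bohr_element`, the frequencies, and the width `eta` (standing in for $1/\log^K p$) are all illustrative assumptions.

```python
def circle_norm_numerator(a, n, p):
    """Return p * ||a*n/p||, the distance from a*n to the nearest multiple of p."""
    r = (a * n) % p
    return min(r, p - r)


def bohr_element(freqs, p, eta):
    """Return the least 1 <= n <= p-1 with ||a*n/p|| < eta for every a in freqs.

    Returns None if no such n exists.
    """
    for n in range(1, p):
        if all(circle_norm_numerator(a, n, p) < eta * p for a in freqs):
            return n
    return None


# Hypothetical example: two frequencies modulo 101 with width eta = 0.2.
# The unit square splits into 5 * 5 = 25 < 101 boxes of side 0.2, so the
# pigeonhole principle guarantees that the search succeeds.
n0 = bohr_element([3, 17], 101, 0.2)
print(n0)
```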

Let $\{a_1, \ldots, a_k\}$ be the values of $a$ satisfying (1), which are the indices
of the terms in $\Sigma_1$. Then, we apply Lemma 1 with $K = 2L$, and deduce
that there is an integer $n_0$ satisfying (4). Now, let $N$ be the arithmetic
progression

\[ N = \{ j n_0 \pmod p : 0 \le j < \log^L p \}. \]

We identify $N$ with its scaled indicator function

\[ N(n) = \begin{cases} 1/|N|, & \text{if } n \in N; \\ 0, & \text{if } n \notin N. \end{cases} \]

Then, we define the Fourier transform of this scaled indicator function:

\[ \hat N(a) = \sum_{n=0}^{p-1} N(n)\, e^{2\pi i a n/p}. \]
We now consider the convolution

\[ (S*N)(m) = \sum_{a+b \equiv m \pmod p} S(a)N(b) = \frac{1}{|N|} \sum_{n \in N} S(m-n). \]

It is obvious that

\[ 0 \le (S*N)(m) \le 1. \]
And, we have the following basic fact:

Lemma 2 Suppose that

\[ 1 > \epsilon > \frac{1}{\log^L p}. \]

If $(S*N)(m) \ge 1 - \epsilon$ for some $0 \le m \le p-1$, then $S$ contains an arithmetic
progression of length at least $\epsilon^{-1}$.

Proof of the Lemma. If $(S*N)(m) \ge 1 - \epsilon$, then we are saying that
the set $S$ contains all but $\epsilon \log^L p$ of the residues

\[ m,\ m - n_0,\ m - 2n_0,\ \ldots,\ m - \lfloor \log^L p \rfloor n_0 \pmod p. \tag{5} \]

Clearly, then, $S$ will contain an AP of length at least $\epsilon^{-1}$ for $\epsilon > 1/\log^L p$.
■
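To see what the convolution measures, the sketch below computes $(S*N)(m)$ for a short progression $N$ and, for a given $m$, the longest block of consecutive terms $m - jn_0$ that lie in $S$. The function names, the set $S$, and the parameters are hypothetical, and the code does not attempt to verify the constant in Lemma 2.

```python
def convolution_with_progression(S, p, n0, length):
    """Return [(S*N)(m) for m = 0..p-1], where N = {j*n0 mod p : 0 <= j < length}
    carries weight 1/|N| on each of its elements."""
    members = set(S)
    N = {(j * n0) % p for j in range(length)}
    return [sum(1 for n in N if (m - n) % p in members) / len(N) for m in range(p)]


def longest_run(S, p, m, n0, length):
    """Longest block of consecutive j in [0, length) with m - j*n0 (mod p) in S."""
    members = set(S)
    best = run = 0
    for j in range(length):
        run = run + 1 if (m - j * n0) % p in members else 0
        best = max(best, run)
    return best


# Hypothetical example: S = {0, ..., 19} in Z/31Z, n0 = 1, |N| = 5.
conv = convolution_with_progression(range(20), 31, 1, 5)
print(conv[10], longest_run(range(20), 31, 10, 1, 5))
```

When $(S*N)(m) = 1$, every term $m - jn_0$ with $0 \le j < |N|$ lies in $S$, so $S$ contains the whole progression.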

We will now show that if $S$ is a critical set, then

\[ (S*N)(m) \ge 1 - \frac{\log\log p}{\log^{1/4} p}, \tag{6} \]

for some $m$; and so, our theorem will follow from Lemma 2.

To show that this is the case, suppose, for proof by contradiction, that
(6) fails to hold for every $0 \le m \le p-1$; and, let

\[ K = \max\left( 1 - \frac{\log\log p}{\log^{1/4} p},\ \max_{0 \le m \le p-1} |(S*N)(m)| \right). \]

Then, define the weighting function $w(m)$ for $0 \le m \le p-1$ to be

\[ w(m) = K^{-1} (S*N)(m). \]

Clearly,

\[ 0 \le w(m) \le 1. \]

Now we need the following lemma:
Lemma 3 Suppose that $w(m)$ is a real-valued function supported on the
integers in $[0, p-1]$, satisfying $0 \le w(m) \le 1$. Then, there exists a function
$u(m)$, also supported on the integers in $[0, p-1]$, such that

1. $u(m) \in \{0, 1\}$ for all $m = 0, 1, \ldots, p-1$;

2. $\hat u(a) = \hat w(a) + O((\log p)\sqrt{p})$; and,

3. $\hat u(0) = \hat w(0) + \delta$, where $0 \le \delta < 1$.

Before we can prove this lemma we require the following concentration
of measure result due to Hoeffding (also see [8, Theorem 5.7]):

Proposition 1 Suppose that $v_1, \ldots, v_r$ is a sequence of independent random
variables where $|v_i| \le 1$. Let

\[ \nu = E(v_1 + \cdots + v_r) = E(v_1) + \cdots + E(v_r), \]

and let $\Sigma = v_1 + \cdots + v_r$. Then,

\[ P(|\Sigma - \nu| > rt) \le 4\exp(-rt^2/2). \]

Remark: A stronger result is possible here, using Hoeffding's theorem.
The result here is obtained as follows: write $v_i = x_i + i y_i$, where $-1 \le x_i, y_i \le 1$,
and then observe that if the "bad event" $|\Sigma - \nu| > rt$ occurs,
then either we have the "bad event" $|\Sigma_x - \nu_x| > rt/\sqrt{2}$ or the "bad event"
$|\Sigma_y - \nu_y| > rt/\sqrt{2}$, where $\Sigma_x = x_1 + \cdots + x_r$ and $\nu_x = E(x_1 + \cdots + x_r)$,
and where $\Sigma_y$ and $\nu_y$ are defined analogously. Using Hoeffding's theorem,
the probability that either of these last two bad events occurs is at most
$4\exp(-rt^2/2)$, as in the proposition above.

Proof of the Lemma. We will let $u'(m)$ be a sequence of independent
Bernoulli random variables with distribution

\[ P(u'(m) = 1) = w(m). \]

We note that

\[ E(u'(m)) = w(m). \tag{7} \]

Then, for each integer $a$ satisfying $0 \le a \le p-1$, we have that the
Fourier transform

\[ \hat u'(a) = \sum_{j=0}^{p-1} u'(j)\, e^{2\pi i a j/p} \]

can be interpreted as a sum of independent random variables as follows:
$\hat u'(a) = v_0 + \cdots + v_{p-1}$, where $v_j = u'(j)\, e^{2\pi i a j/p}$.
Now,

\[ E(\hat u'(a)) = E(v_0) + \cdots + E(v_{p-1}) = \sum_{j=0}^{p-1} E(u'(j))\, e^{2\pi i a j/p} = \hat w(a). \]

Applying Proposition 1 with $r = p$ and $t = (\log p)/\sqrt{p}$, we deduce that

\[ P\bigl( |\hat u'(a) - \hat w(a)| > (\log p)\sqrt{p} \bigr) \le 4\exp(-(\log^2 p)/2). \]

Thus, the probability that

\[ \text{For all } a = 0, 1, \ldots, p-1, \quad |\hat u'(a) - \hat w(a)| \le (\log p)\sqrt{p} \tag{8} \]

is at least

\[ 1 - 4p \exp(-(\log^2 p)/2), \]

which is positive for $p > 17$.

Since (8) holds with positive probability, there must exist a function
$u(m)$, supported on $0, 1, \ldots, p-1$, taking the values 0 and 1, and such that

\[ \text{For all } a = 0, 1, \ldots, p-1, \quad |\hat u(a) - \hat w(a)| \le (\log p)\sqrt{p}. \]

Then, by reassigning at most $O((\log p)\sqrt{p})$ of the $u(m)$'s to 0 or 1 as needed,
we can get

\[ \hat u(0) = \hat w(0) + \delta, \quad 0 \le \delta < 1, \]

while maintaining

\[ \hat u(a) = \hat w(a) + O((\log p)\sqrt{p}) \]

for all the other values $a = 1, 2, \ldots, p-1$. Thus, we have constructed a
function $u(m)$ which satisfies the conclusion of our lemma. ■
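The random rounding used in this proof is easy to simulate. The sketch below rounds a hypothetical weight function to a 0/1 function with independent Bernoulli draws and reports the largest Fourier deviation next to the quantity $(\log p)\sqrt{p}$. The weight function, the seed, and the helper names are assumptions made for illustration, and a single sample only illustrates the concentration rather than proving it.

```python
import cmath
import math
import random


def fourier_of_function(f, p):
    """Return [f^(a) for a = 0..p-1], where f^(a) = sum_j f[j] e^{2 pi i a j / p}."""
    return [sum(f[j] * cmath.exp(2j * cmath.pi * a * j / p) for j in range(p))
            for a in range(p)]


def random_rounding(w, rng):
    """Replace each weight w[m] in [0, 1] by an independent Bernoulli(w[m]) value."""
    return [1 if rng.random() < w[m] else 0 for m in range(len(w))]


p = 101
rng = random.Random(0)  # fixed seed so that the run is reproducible
# Hypothetical weight function with values in [0.1, 0.9].
w = [0.5 + 0.4 * math.cos(2 * math.pi * m / p) for m in range(p)]
u = random_rounding(w, rng)

w_hat = fourier_of_function(w, p)
u_hat = fourier_of_function(u, p)
max_dev = max(abs(u_hat[a] - w_hat[a]) for a in range(p))
bound = math.log(p) * math.sqrt(p)
print(max_dev, bound)
```

The deviation at $a = 0$ is the difference between $|S'|$ and $\sum_m w(m)$, which the proof then corrects by changing a few values of $u$.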

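Now let $S'$ denote the set for which $u(m)$ is the indicator function. Then,
we have that

\[ |S'| = K^{-1}|S| + \delta, \quad \text{where } 0 \le \delta < 1. \]

We now estimate the number of 3AP's contained in $S'$ modulo $p$. This
number is

\[ \frac{1}{p} \sum_{a=0}^{p-1} \hat u(a)^2 \hat u(-2a) = \frac{1}{p} \sum_{a=0}^{p-1} \bigl( \hat w(a) + O((\log p)\sqrt{p}) \bigr)^2 \bigl( \hat w(-2a) + O((\log p)\sqrt{p}) \bigr) = \frac{1}{p} \sum_{a=0}^{p-1} \hat w(a)^2 \hat w(-2a) + E, \tag{9} \]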
where

\[ E = O\left( \frac{1}{p} \sum_{a=0}^{p-1} \Bigl( (\log^3 p)\, p^{3/2} + (\log^2 p)\, p\, |\hat w(a)| + (\log p)\sqrt{p}\, \bigl( |\hat w(a)|^2 + |\hat w(a)\hat w(-2a)| \bigr) \Bigr) \right). \]

Using the Cauchy-Schwarz inequality, in combination with Parseval's identity,
one can show that

\[ E = O((\log^3 p)\, p\sqrt{p}); \]

and so it follows that the number of 3AP's in $S'$ modulo $p$ is

\[ \frac{1}{p} \sum_{a=0}^{p-1} \hat w(a)^2 \hat w(-2a) + O((\log^3 p)\, p^{3/2}) = \frac{1}{K^3 p} \sum_{a=0}^{p-1} \hat S(a)^2 \hat S(-2a) \hat N(a)^2 \hat N(-2a) + O((\log^3 p)\, p^{3/2}). \tag{10} \]



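We now break this last sum into the two sums $\Sigma_1' + \Sigma_2'$, where $\Sigma_1'$ is over
those $0 \le a \le p-1$ satisfying (1), and $\Sigma_2'$ is the sum for the remaining
values of $a$. Now, for each $a$ satisfying (1) and for each $n \in N$ we have from
(4) with $K = 2L$ that

\[ \left\| \frac{-2an}{p} \right\| \le 2 \left\| \frac{an}{p} \right\| < \frac{2}{\log^L p} \]

for $p$ sufficiently large. The same estimate holds for the distance from $an/p$
to the nearest integer. Thus,

\[ \hat N(-2a) = \frac{1}{|N|} \sum_{n \in N} e^{2\pi i(-2an)/p} = \frac{1}{|N|} \sum_{n \in N} \left( 1 + O\left( \frac{1}{\log^L p} \right) \right) = 1 + O\left( \frac{1}{\log^L p} \right); \]

and, the same estimate holds for $\hat N(a)$. Thus, we conclude that

\[ \Sigma_1' = \Sigma_1 + O\left( \frac{p^3}{\log^L p} \right). \]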
We also have the estimate

\[ |\Sigma_2'| \le \frac{p \log\log p}{\sqrt{\log p}} \sum_{(*)} |\hat S(a)|^2 \le \frac{d p^3 \log\log p}{\sqrt{\log p}}, \]

where $(*)$ represents the condition that $0 \le a \le p-1$ and $a$ does not
satisfy (1). We note that the inequality here follows from Parseval's identity,
together with the trivial bound $|\hat N(a)| \le 1$.

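Combining our estimates for $\Sigma_1'$ and $\Sigma_2'$ together with (10), we deduce
that the number of 3AP's in $S'$ modulo $p$ is

\[ \frac{1}{K^3 p} (\Sigma_1' + \Sigma_2') + O((\log^3 p)\, p^{3/2}) = \frac{1}{K^3 p} (\Sigma_1 + \Sigma_2) + O\left( \frac{d p^2 \log\log p}{K^3 \sqrt{\log p}} \right). \]

Thus,

\[ \#(\text{3AP's in } S' \text{ (mod } p)) = \frac{1}{K^3} \times \#(\text{3AP's in } S \text{ (mod } p)) + O\left( \frac{d p^2 \log\log p}{\sqrt{\log p}} \right). \tag{11} \]

We now proceed to show that this is impossible, and from our chain of
reasoning above, this would mean that (6) holds, and therefore the theorem
would follow from Lemma 2.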

To show the above equation cannot hold, we require the following combinatorial
lemma, which is proved using the probabilistic method, in combination
with the second moment method:

Lemma 4 Suppose $A, B \subseteq \mathbb{Z}/p\mathbb{Z}$ have densities $\gamma$ and $\delta$, respectively; and,
suppose that $A$ and $B$ contain $\alpha\gamma^3 p^2$ and $\beta\delta^3 p^2$ non-trivial 3AP's, respectively.
Then, there exists a subset $C$ of $\mathbb{Z}/p\mathbb{Z}$, having density at least

\[ \gamma\delta + O(p^{-1/4}), \]

such that the number of non-trivial 3AP's lying in $C$ modulo $p$ is at most

\[ \alpha\beta(\gamma\delta)^3 p^2 + O(p^{3/2}). \]

Remark. The same result holds if we add in trivial AP's, since a subset $D$
of $\mathbb{Z}/p\mathbb{Z}$ can have only $O(p)$ trivial 3AP's, which is well within the error term.

Proof of Lemma 4. We will find a pair of integers $u, v$ such that
$A \cap (uB + v)$ has the desired properties. First, we show that this intersection
has density very close to $\gamma\delta$ for almost all $0 \le u, v \le p-1$, by using a
second moment argument. We suppose that $u$ and $v$ are random variables
chosen independently from $\{0, \ldots, p-1\}$ with the uniform measure. Then,
the variance $V(|A \cap (uB + v)|)$ is

\[ E(|A \cap (uB + v)|^2) - E(|A \cap (uB + v)|)^2. \]

To compute the first expectation we express the intersection as a sum of
indicator functions:

\[ |A \cap (uB + v)| = \sum_{b \in B} f(ub + v), \]

where $f$ is the indicator function for the set $A$. So, we have that

\[ E(|A \cap (uB + v)|^2) = \sum_{b, b' \in B} E(f(ub + v) f(ub' + v)) = \frac{1}{p^2} \sum_{b, b' \in B}\ \sum_{0 \le u, v \le p-1} f(ub + v) f(ub' + v). \]

Now, given a pair of unequal elements $b, b' \in B$, and any two elements
$a, a' \in A$ ($a$ may equal $a'$), there is exactly one pair of numbers $u, v \pmod p$
which make $ub + v \equiv a \pmod p$ and $ub' + v \equiv a' \pmod p$. That is,
we have that if $b' \ne b$, then there are exactly $|A|^2$ pairs $u, v$ which make
$f(ub + v) f(ub' + v) \ne 0$ (and therefore equal to 1). Thus,

\[ E(|A \cap (uB + v)|^2) \le \gamma^2\delta^2 p^2 + |B|. \]

The term $|B|$ comes from those pairs $b, b'$ with $b = b'$.

To estimate $E(|A \cap (uB + v)|)$, we note that for a fixed $b \in B$ and
$0 \le u \le p-1$, the probability that $ub + v$ lies in $A$ is $\gamma$. Thus, the expected
size of this intersection is $\gamma\delta p$.

We now conclude that

\[ V(|A \cap (uB + v)|) < |B| = \delta p; \]

and so, by an application of Chebyshev's inequality we conclude that

\[ P\bigl( |A \cap (uB + v)| < (1 - \epsilon)\gamma\delta p \bigr) < \frac{1}{\epsilon^2 \gamma^2 \delta p}. \]

Next, we compute the expected number of 3AP's in the intersection
$A \cap (uB + v)$. Let $Q = Q(u, v)$ be the number of non-trivial 3AP's lying
in $A \cap (uB + v)$. Now, suppose that $x_1, x_2, x_3$ is a non-trivial 3AP in $A$, so
that $x_2 \equiv x_1 + d$, $x_3 \equiv x_1 + 2d \pmod p$, for some $d \not\equiv 0 \pmod p$; and,
suppose that $y_1, y_2, y_3$ is a non-trivial 3AP in $B$. Then, there is exactly one
pair $0 \le u, v \le p-1$ such that

\[ \text{For } i = 1, 2, 3, \quad u y_i + v \equiv x_i \pmod p. \]

Thus, for $u, v$ chosen at random from $0, \ldots, p-1$ with uniform probability,
the probability that a particular non-trivial 3AP of $A$ lies in $uB + v$ is $\beta\delta^3$; and so,
the expected size of $Q$ is $\alpha\beta(\gamma\delta)^3 p^2$. So, there can be at most $p^2 - p^{3/2}$ of the $p^2$
choices for $u$ and $v$ such that the intersection has more than $\alpha\beta(\gamma\delta)^3 (p^2 + 2p^{3/2})$
3AP's; else, if all but fewer than $p^{3/2}$ of the choices give more than this many
3AP's in this intersection, then we would have that the expected size of $Q$ exceeds

\[ \frac{(p^2 - p^{3/2})(p^2 + 2p^{3/2})}{p^2}\, \alpha\beta(\gamma\delta)^3, \]

which we know is not the case. Thus, the probability that $Q \le \alpha\beta(\gamma\delta)^3 (p^2 + 2p^{3/2})$
is at least $p^{-1/2}$. So, for $\epsilon = p^{-1/4}\gamma^{-1}\delta^{-1/2}$, we get that the events

\[ |A \cap (uB + v)| \ge (1 - \epsilon)\gamma\delta p \quad \text{and} \quad Q \le \alpha\beta(\gamma\delta)^3 (p^2 + 2p^{3/2}) \]

occur together with positive probability. So, there is a choice for $u$ and $v$ so that
both these events occur, which proves the lemma. ■
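The first-moment computation in this proof can be checked exhaustively for a small prime. The sketch below lists $|A \cap (uB + v)|$ over all pairs with $u \ne 0$ and confirms that the average is $|A||B|/p = \gamma\delta p$. Restricting to $u \ne 0$ is our own simplification (for $u = 0$ the set $uB + v$ collapses to the single point $v$), and the sets and function names are hypothetical.

```python
def affine_image(B, u, v, p):
    """Return the set uB + v modulo p."""
    return {(u * b + v) % p for b in B}


def intersection_sizes(A, B, p):
    """Return |A ∩ (uB + v)| for every pair 1 <= u <= p-1, 0 <= v <= p-1."""
    A = set(A)
    return [len(A & affine_image(B, u, v, p))
            for u in range(1, p) for v in range(p)]


# Hypothetical example in Z/13Z.
p = 13
A = {0, 1, 2, 3, 4}
B = {0, 2, 5, 7}
sizes = intersection_sizes(A, B, p)
mean_size = sum(sizes) / len(sizes)
print(mean_size, len(A) * len(B) / p)
```

For each fixed $u \ne 0$ and $b \in B$, the element $ub + v$ runs over all residues as $v$ varies, so each $b$ contributes exactly $|A|$ to the sum over $v$.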

We require one more lemma before we can prove that (11) is impossible:

Lemma 5 Given $0 < \theta < 1$, there exists a subset $U \subseteq \mathbb{Z}/p\mathbb{Z}$ having density
$1 - \theta + O(1/p)$ such that the number of 3AP's lying in $U$, both trivial and
non-trivial, is at most

\[ p^2 (1 - 3\theta + 2.5\theta^2). \]

For $0 \le \theta \le 1/3$ this quantity is at most

\[ p^2 (1 - \theta)^3 (1 - \theta^2/2). \]

Proof. First, we claim that the sum of the number of 3AP's (trivial and
non-trivial) lying in $U$ and lying in $\overline{U} = (\mathbb{Z}/p\mathbb{Z}) \setminus U$ is

\[ p^2 (1 - 3\theta + 3\theta^2). \tag{12} \]

This follows by inclusion-exclusion: the number of 3AP's lying in $U$ is
$x_1 - x_2 + x_3 - x_4$, where $x_1$ is the total number of 3AP's among the residue
classes modulo $p$; $x_2$ is the sum of the number of these 3AP's $x, x + d, x + 2d$
such that $x \in \overline{U}$, summed with the number where $x + d \in \overline{U}$, and summed
with the number where $x + 2d \in \overline{U}$; $x_3$ is the sum of the number of such
progressions with $x, x + d \in \overline{U}$, then $x, x + 2d \in \overline{U}$, and finally summed with
the number where $x + d, x + 2d \in \overline{U}$; finally, $x_4$ is the number of progressions
in $\overline{U}$. It is easy to see that $x_1 = p^2$, $x_2 = 3\theta p^2$, and $x_3 = 3\theta^2 p^2$. So, the sum of
the number of 3AP's in $U$ and $\overline{U}$ equals the expression in (12).

Now consider the set $\overline{U}$, having density $\theta + O(1/p)$, given as follows:

\[ \overline{U} := [0, \theta p/2] \cup [p/2, p/2 + \theta p/2], \]

where here we take the integers in these two intervals (since $U$ and $\overline{U}$ are
sets of residue classes modulo $p$). Call the first interval $I_0$, and the second
$I_1$. If $x, y \in I_i$, then $z = 2^{-1}(x + y) \pmod p$ lies in $I_i$ if $x, y$ have the same
parity, and lies in $I_{1-i}$ if they are of different parity. This gives that the
number of 3AP's $x, x + d, x + 2d$ in $\overline{U}$ is at least the number of ordered pairs $x, y$
where both $x, y \in I_0$ or both are in $I_1$. So, the number of 3AP's in $\overline{U}$ is at
least $\theta^2 p^2/2$, and it follows that the number of 3AP's in $U$ is at most

\[ p^2 (1 - 3\theta + 3\theta^2 - \theta^2/2) = p^2 (1 - 3\theta + 2.5\theta^2), \]

which proves the lemma. ■
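The inclusion-exclusion step holds exactly when written in terms of $t = |\overline{U}|$: the number of 3AP's in $U$ plus the number in $\overline{U}$ equals $p^2 - 3tp + 3t^2$. The sketch below builds the two-interval set from the proof and checks this identity by brute force. The helper names and the parameters $p = 31$, $\theta = 0.4$ are illustrative assumptions.

```python
import math


def count_3aps(S, p):
    """Count ordered triples (n, n+m, n+2m) mod p inside S, trivial ones included."""
    members = {x % p for x in S}
    return sum(1 for n in members for m in range(p)
               if (n + m) % p in members and (n + 2 * m) % p in members)


def two_interval_set(p, theta):
    """Integers in [0, theta*p/2] and in [p/2, p/2 + theta*p/2], reduced mod p."""
    first = range(0, math.floor(theta * p / 2) + 1)
    second = range(math.ceil(p / 2), math.floor(p / 2 + theta * p / 2) + 1)
    return {x % p for x in first} | {x % p for x in second}


# Hypothetical parameters.
p, theta = 31, 0.4
U_bar = two_interval_set(p, theta)
U = set(range(p)) - U_bar
t = len(U_bar)
total = count_3aps(U, p) + count_3aps(U_bar, p)
print(total, p * p - 3 * t * p + 3 * t * t)
```

Replacing $t$ by $\theta p$ recovers the expression in (12).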

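Now we let $\theta = 1 - K$, and let $U$ be the set given by this lemma. Then,
we apply Lemma 4 with $A = U$ and $B = S'$, and we deduce that there is a
set $C$ with

\[ |C| = |S| + O(p^{3/4}), \]

such that $C$ contains at most

\[ (1 - \theta)^3 \left( 1 - \frac{\theta^2}{2} \right) \left( \frac{\#(\text{3AP's in } S)}{K^3} + O\left( \frac{d p^2 \log\log p}{\sqrt{\log p}} \right) \right) = \#(\text{3AP's in } S) \left( 1 - \frac{(\log\log p)^2}{2\sqrt{\log p}} + O\left( \frac{\log\log p}{\sqrt{\log p}} \right) \right) \]

3AP's.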

To show that this is impossible for sufficiently large $p$, we let $C'$ be any
set gotten from $C$ by adding or removing at most $O(p^{3/4})$ elements such
that

\[ |C'| = |S|. \]

Then, in the worst case, each element we add to $C$ (to produce $C'$) adds at
most $3p$ new 3AP's. Thus,

\[ \#(\text{3AP's in } C') = \#(\text{3AP's in } C) + O(p^{7/4}) \le \#(\text{3AP's in } S) \left( 1 - \frac{(\log\log p)^2}{2\sqrt{\log p}} + O\left( \frac{\log\log p}{\sqrt{\log p}} \right) \right). \tag{13} \]
To get this inequality we have used a corollary of the following theorem of
Varnavides, which allows us to absorb the error term $O(p^{7/4})$ into the
error $O((\log\log p)/\sqrt{\log p})$.

Theorem 2 Given $0 < \alpha \le 1$, there exists $0 < c < 1$ such that for any set
$T \subseteq \{1, 2, \ldots, x\}$ having $|T| \ge \alpha x$,

\[ \#\{a, b, c \in T : a + b = 2c\} \ge c x^2. \]

This corollary is:

Corollary 1 There exists $0 < c < 1$, depending only on $d$ (the lower bound
for the density of $S$), such that

\[ \#(\text{3AP's in } S) \ge c p^2. \]

The proof of this corollary is immediate, since if we think of $S$ as a set
of integers, say $S \subseteq \{0, \ldots, p-1\}$ (instead of as a set of residue classes
modulo $p$), then every solution to $a + b = 2c$, $a, b, c \in S$ in the integers gives
a solution $a + b \equiv 2c \pmod p$. So, the number of 3AP's in $S$ modulo $p$ is
at least the number of 3AP's in $S$, when we think of it as a subset of the
integers.
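The comparison used in this proof can be illustrated by counting both kinds of solutions for a small set. The function names and the sample set are our own; both counts include the trivial solutions $a = b = c$.

```python
def count_integer_3aps(S):
    """Count ordered pairs (a, b) in S^2 whose integer midpoint (a + b)/2 lies in S."""
    members = set(S)
    return sum(1 for a in members for b in members
               if (a + b) % 2 == 0 and (a + b) // 2 in members)


def count_modular_3aps(S, p):
    """Count ordered pairs (a, b) in S^2 whose midpoint modulo the odd prime p lies in S."""
    members = {x % p for x in S}
    inv2 = (p + 1) // 2  # inverse of 2 modulo p, valid because p is odd
    return sum(1 for a in members for b in members
               if ((a + b) * inv2) % p in members)


# Hypothetical example: a subset of {0, ..., 12} viewed in the integers and in Z/13Z.
S = {0, 1, 5, 6, 12}
print(count_integer_3aps(S), count_modular_3aps(S, 13))
```

Every integer solution reduces to a modular one, while wrap-around solutions such as $12 + 1 \equiv 2 \cdot 0 \pmod{13}$ are counted only modulo $p$.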

Now, (13) contradicts the fact that $S$ is a critical set: here we have
constructed a set $C'$ having the same cardinality as the set $S$, but where $C'$
has fewer 3AP's than $S$. Thus, we must conclude that

\[ (S*N)(n) \ge 1 - \frac{\log\log p}{\log^{1/4} p} \]

for some $0 \le n \le p-1$, and the theorem is proved. ■